Squeeze-and-Breathe Evolutionary Monte Carlo Optimisation with Local 
Search Acceleration and its application to parameter fitting 
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Abstract 

Motivation: Estimating parameters from data is a Icey stage of the modelling process, particularly in 
biological systems where many parameters need to be estimated from sparse and noisy data sets. Over 
the years, a variety of heuristics have been proposed to solve this complex optimisation problem, with 
good results in some cases yet with limitations in the biological setting. 

Results: In this work, we develop an algorithm for model parameter fitting that combines ideas from 
evolutionary algorithms, sequential Monte Carlo and direct search optimisation. Our method performs 
well even when the order of magnitude and/or the range of the parameters is unknown. The method 
refines iteratively a sequence of parameter distributions through local optimisation combined with partial 
resampling from a historical prior defined over the support of all previous iterations. We exemplify 
our method with biological models using both simulated and real experimental data and estimate the 
parameters efficiently even in the absence of a priori knowledge about the parameters. 
Availability: Matlab code available from the authors upon request. 

1 Introduction 

The increasing drive towards quantitative technologies in Biology has brought with it a renewed interest 
in the modeling of biological systems. Models of biological systems and other complex phenomena are 
generally nonlinear with uncertain p arameters, many of which are often unknown and/or unmeasurable 
(JAlonl . 120071 : lEdelstein-Keshetl . Il988l ) . Crucially, the values of the parameters dictate not only the q uan- 



titative but also the qualitative behaviour of such models ( Brown and Sethnal . 120031 : IStrogata . 1 19941 ). A 



fundamental task in quantitative and systems biology is to use experimental data to infer parameter values 
that minimise the discrepancy between the behaviour of the model and experimental observations. The 
parameters thus obtained can then be cross-validated against unused data before employing the fitted 
model as a predictive tool (JAlonl . 120071 ). Ideally, this process could help close the modelling-experiment 
loop by: suggesting specific experimental me asurements: identifying r elevant parameters to be measured : 



or di scriminating between alternative models (jGutenkunst et a/.l . 120071 : iToni and Stumpi l2009l : lYates et al 

booth . 

The probleixL of parameter estimation and da ta fitting is classically posed as the minimisation of a 
cost function (i.e., the error) (Gershenfeldj, Il999l ). In the case of overdetermined linear systems with 



quadratic error functions, this problem leads to least-square solutions, convex optimisations that can be 
solve d efficiently and globally ba sed on the singular value decomposition of the covariance matrix of the 
data ( Lawson and Hansoru . ll995l ). However, data fitting in nonlinear systems with small amounts of data 
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rema ins difficult, as it usually leads to non-convex optimisations with many local minima (JBrewer et all 
20081 ). 

A classic case in biological modeling is the description of the time evolution of a system through ordinary 
differential equations (ODEs), usually based on mechanistic functi onal forms. Examples incl ude models 



of bio chemical reactions, infectious spread and neuronal dynamics (lAnderson and Mavl. Il992l: Edelstein 



Keshet. Il988l ). Typically, optimal parameters of the nonlinear ODEs must be inferred from experimental 
time courses but the associated optimisation is far from straightforward. Standard optimisation techniques 
that require an explicit cost function are unsuitable for this problem due to the diff iculty to obtain full 
analytical solutions for nonlinear ODEs ( Brown and Sethna ^ 2003 ^ Chen et al. ^ 2010l: Papachristodoulou 
and Recht, 120071 ). Spline-b ased methods , which approximate the solution though an implicit integration 
of the differential equation ( Brewer et al\ . l2008l ). require linearity in the parameters and are therefore not 
applicable to models with nonlinear parameter depende ncies, e . g. Mi chaelis-Menten and Hill kinetics. 
I inplicit techniques, such as direct search methods ( Powell ^ 19981) . Simulated Annealing f Kirkpatrick 



e.t al., 1983), Evolut ionary Algorithms ( Mitchelll . 119971 : iRunarsson and Yad . l200d ) or Sequential Monte Carlo 
( Sisson et all 120071 ). do not require an explicit cost function. However, if as is usually the case, the cost 



function is a complicated (hyper) surface in parameter space with many local minima, gradient and direct 
search methods tend to get trapped in local minima due to their use of local information. Although still 
a local method. Simulated Annealing alleviates some of the problems related to local minima through the 
use of stochasticity. However, this comes at the cost of high computational overhead and slow convergence 
and, yet, with no guarantee of finding the global minimum. 

Instead of an optimisation based on local criteria. Evolutionary Algorithms (EA) produce an ensemble 
of possible answers and evolve the m global l y through random mutation and cross-over follo wed by ranking 
and culling of the worst solutions ( Mitchelll . 119971 : IRunarsson and Yad . l200d : ISchwefell . Il995 ). This heuristic 



has been shown to provide an efficient protocol for parameter fitting in the life sciences ( Moles et all |2003| : 



Zi and Klippl . l200d ) . However, EA methods can be inefficient when the feasible region in parameter space 
is too large, a case typical of models with large uncertainty in the parameters. 



Probabilistic methods, such as Sequential Monte-Carlo (SMC) (ISisson et gLl . l2007l ) . propose a different 
conceptual framework. Rather than finding a unique optimal parameter set, SMC maps a prior probability 
distribution of the parameters onto a posterior constructed from samples with low errors until reaching a 
converged posterior. Recently, SMC has been co mbined with App roximate Bayesian Computation (ABC) 



and applied to data fitting and model selection (JToni et a/.l . l2009l ) . However, methods such as ABC-SMC 



are not only computationally expensive but also require that the starting prior include the true value of 
the parameters. This requirement dents its applicability to many biological models, in which not even the 
order of magnitude of the parameters is known. In that case, the support of the starting priors must be 
made overly large (leading to extremely slow convergence) in order to avoid the risk of excluding the true 
parameter value from the search space. 

In this work, we present an optimisation algorithm for data fitting that takes inspiration from EA, SMC 
and direct search optimisation. Our method iterates and refines samples from a probability distribution 
of the parameters in a 'squeeze-and-breathe' sequence. At each iteration the probability distribution is 
'squeezed' by the consecutive application of local optimisation followed by ranking and culling of the local 
optima. The parameter distribution is then allowed to 'breathe' through a random update from a historical 
prior that includes the union of all past supports of the solutions (Fig. [T]). This iteration proceeds until 
convergence of the distribution of solutions and their average error. A key feature of the algorithm is the 
accelerated step-to-step convergence through a combination of local optimisation and of culling of local 
solutions. Importantly, the method can also find parameters that lie outside of the range of the initial 
prior, and can deal with parameter values that extend across several orders of magnitude. We now provide 
definitions and a full description of our algorithm and showcase its applicability to different biological 
models of interest. 



2 Algorithm 

2.1 Formulation of the problem 

Let X(i) = [xi{t), . . . , Xd{t)] denote the state of a system with d variables at time t. The time evolution of 
the state is described by a system of (possibly nonlinear) ODEs: 

X = /(X,t;0). (1) 

Here 6 = [^i , . . . , ^^r] is the vector of N parameters of our model. 

The experimental data set is formed by M observations of some of the variables of the system: 

p = |x(t,)|i = l,...,M}. (2) 

Ideally, M > 2N + 1 since 2N + 1 experiments are enough f or unequivoca l identification of an ODE model 



with N parameters when no measurement error is present (jSontagl . |2002| ) 
The cost function (i.e., the error) to be minimised is: 



M 



Evi9) = Y,\\Mt^■,^)-Mt^) , (3) 

i=l 

where ||-|| is a relevant vector norm. A standard choice is the Euclidean norm (or 2-norm) which corresponds 
to the sum of squared errors: 

M d' _ 2 

where we assume that d' variables are observed. The cost function Ej) : W^ — )■ R_|_ maps a A^-dimensional 
parameter vector onto its corresponding error, thus quantifying how far the data and the model predictions 
are for that particular parameter set. 

The aim of the data fitting procedure is to find the parameter vector 6** that minimises the error 
globally subject to restrictions dictated by the problem of interest: 

0** = nimEx>{0), subject to constraints on 6. (5) 



2.2 Definitions 

• Data set: D, a set of M observations, as defined in Eq. ([2]). 

• Parameter set: = [^i, . . . , 0jv] € M^. Due to the nature of the models considered, 6i > 0, Vi. 

• Objective function: Ex){0), the error function to be minimised, as defined in Eq. @. 

• Set of local minima of Ev{0): M = {0* \ Ev{0*) < Ev{0), 
MO e M{0*)} where M{0*) is a neighbourhood of 0* . 

• Global minimum of Et:)[6): 0**, a parameter set such that Ex>{0**) < Ex){9), MO. Clearly, 0** G M. 



Local minimisation mapping: L : Mlj: — )■ M. Local minimisation maps onto a local minimum: 

L{e) = 6* em. 

Ranking and culling of local minima: {0'}f = TZCb ({^}i)- This operation ranks J parameter sets 
and selects the B parameter sets with the lowest Ex>- 

Joint probability distributions of the parameters at iteration k: TTk{0) (prior) and Wk{0) (posterior). 



Algorithm 1 Squeeze-and-Breathe optimisation. 



Set running parameters of algorithm: S, J G N, Pm £ [0, 1], Tol 
Choose initial priors ttq{0) and Co(^)- 
Set 7^0 = and A: ^ 1. 
repeat 

Let Uk = 'Hk-i- 

Simulate J points from Trk~i{0) through re-population. 

for £ = 1 ^- J do 

Obtain local minimum 0| = L{6i). 
Store the pair [0|,£'x)(0^)] in Hk- 

end for 

Rank and cull the set of local minima: Tik = TICb {T~Lk) 

Define the posterior Wk{0) from the sample T-Lk- 

Update Cfc(^) from C,k-i{!S) and WkiO). 

Update the prior 7rfc(0) ~ pmVUk{9) + (1 - Pm)Ck{9)- 

k-^k + l. 
until (t)k < Tol and MW {zuk{0),Wk^i{6)) = 



Marginal probability distribution of the i component of 6: For instance, 7r(0j) = J 7t{6) Yirj^i^^r- 
Historical prior at iteration k: Cfc(^) = ni=i Cfc(^i) where 

Ck{0i) ~ U (min Ok{0^)) , max (3fc(^^))) • (6) 

Here U{a, b) is a uniform distribution with support in [a, b] and '5k{Gi) = C/7-i ^ '^k ^^ ^^^ union of 
the supports of vjk{0i) and Cfe-i(^j)- 

Update of the prior at iteration k: iTkiO) = ^1=1 '^ki^i) with 

-n-kidi) ~ PmWk{Oi) + (1 - Pm)Ck{di), (7) 

that is, a convex mixture of the posterior and the historical prior with weight pm.. 

Re-population: Obtain population of J random points simulated from the prior Tik-iiO). 

Convergence criterion for the error: The difference between the means of the errors of the posteriors 
in consecutive iterations is smaller than the pre-determined tolerance: 



0fc = Ev{nJk-im - EvizukiO)) < Tol. (8) 

• Convergence criterion for the empirical distributions: The samples of the posteriors in consecutive 
iterations are indistinguishable at the 5% significance level according to the nonparametric Mann- 
Whitney rank sum test: 

MW{wk{e),wk-i{e)) = o. (9) 

2.3 Description of the algorithm 

Algorithm [1] presents the pseudo-code for our method using the definitions above. The iterations 
produce progressively more refined distributions of the parameter vector. At each iteration k, a population 
simulated from the prior distribution TTk~i{0) is locally minimised followed by ranking and culling of the 
local minima to create a posterior distribution Wk{d) (squeeze step). This distribution is then combined 
with an encompassing historical prior to generate the updated prior TTk{0) (breathe step). The iteration 
loop terminates when the difference between the mean errors of consecutive posteriors is smaller than the 
tolerance and the samples of the posteriors are indistinguishable. We now explain these steps in detail 
(Fig. d]) through the BPM model (see Sec. [3l^ . 
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Figure 1: (Colour online) Steps of Algorithm [T] exemplified through the BPM model (fTO|) . A: The problem is defined 
by the data set, the model and the error function to be minimised. Note the rugged landscape of the error function 
in the parameter plane [a, /3), with many local minima. B: In the first iteration, wc simulate J points in parameter 
space from the uniform initial prior 'Kq{0) (grey squares, grey histograms) which arc then minimised locally with 
a Nelder-Mead algorithm L{6) (blue triangles, blue histograms). The local optimisation aligns the parameter sets 
onto the level curves of E-d. C: The B best local minima (top, light blue squares) are selected and considered to be 
samples from the posterior distribution (bottom, light blue histograms). D: Convergence of the error of the samples 
(top, local minima in blue, B lowest in light blue) and of the posterior distributions (bottom, light blue) arc checked 
against the errors of the sample (top, in grey) and the priors (bottom, in grey). E: If convergence is not achieved, 
the historical prior is updated (previous historical prior in red updated to light blue) and a new set of J points are 
simulated from the posterior with probability Pm and from the historical prior with probability 1 —pm (grey squares). 
This new sample is fed back to the local minimisation step B . F: The algorithm stops when convergence is reached 
(after nine iterations, in this case) providing an optimal parameter set 0^ (top, time course of optimal model in blue) 
and the sequence of optimised posteriors at each iteration (bottom). 



1. Formulation of the optimisation: The data set "D and the model equations parameterised by ahow 
us to define an error function Ex>{0) whose global minimum corresponds to the best model. 

In our illustrative example, the BPM model (|1U|) has the parameter vector 9 = [a, f3] and the error 
function is depicted in Fig. [T]A.. The global optimisation on the rugged landscape of this function is 
computationally hard. 

2. Initialisation: 

• Set the running parameters of the algorithm: the size of the simulated population, J; the size 
of the surviving population after culling, B; the update probability, pm', and the tolerance, Tol. 
In this example, J = 500, B = 50, p^ = 0.95 and Tol = W~^. 

• Choose 7ro(0), the initial prior distribution of the parameter vector. 

In this case, we take a and /3 to be independent and uniformly distributed: 'Kq{0) ~ C^(0, 100) x 
C/(0, 100). 

• Initialise Co(^) = '''"o(^)) the historical prior of the parameters. 

• Simulate J points from vro(0) to generate the initial sample {^q}/- 

3. Iteration (step k): Repeated until termination criterion is satisfied. Figure [T] shows the first iteration 
of our method applied to the BPM example. 

(a) Local minimisation: Apply local minimisation to the simulated parameters from the 'prior' 
{6k-i\i and map them onto local minima of Ex>{9) to generate {L(0fc_i)}/ € M. 
Here we use the Nelder-Mead simplex method ( Nelder and Mead . 1 19651 ). though others can be 



used. Figure [IB shows the simulated points from 7ro(0) (grey squares) and its corresponding 
histograms (in grey). After local minimisation, this sample is mapped onto the dark blue 
triangles in Fig. [TJB (histograms in dark blue). Note how the local minima align with the level 
curves of Ejy with a markedly different distribution to the uniform prior. Note also that many 
of the optimised values of a lie outside the range of the prior (0, 100) and are now distributed 
over the interval (0,200). On the other hand, the values of (3 have collapsed inside (0, 1). 

(b) Ranking and culling: Rank the J + B local minima from the k — 1 and k iterations, select the 
B points with the lowest Ej) and cull (discard) the rest: 



7^Cs({L(0,_l)}fu{0L}f) = {0l}f. 



Denote the best parameter vector of this set as O], = min ( {0}^}i I . We consider {0^}f to be a 

sample from the optimised ('posterior') distribution, Wk{0). 

The B = bO best parameter sets are shown (light blue squares) in Fig.[Tp (light blue histograms). 

Termination criterion: Check that the difference between the mean errors of the consecutive op- 
timised samples is smaller than the tolerance: (j)^ < Tol. We also gauge the 'convergence' of the 
posteriors through the Mann- Whitney (MW) test to determine if the samples from consecutive 
posteriors are distinguishable: 



where MW is a 0-1 flag. The MW test gives additional information about the change of the 
optimised posteriors from one iteration to the next. 

Figure [Tp shows the convergence check for the first iteration of the BPM model: (i) top, errors 
of the sampled prior (grey, left) with errors of the local minima (dark blue, right) and the B 
surviving points (light blue); (ii) bottom, histograms of the prior (grey) and the posterior (light 
blue). Clearly, in this iteration neither the error nor the distributions have converged so the 
algorithm does not stop. 
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Table 1: Results of the fitting of the BPM model with Algorithm [T] smallest error of iteration k; the best values a]. 
and (3l; whether the distributions have converged; and the difference of the mean errors of the optimised population. 



(d) Update of historical prior and generation of new sample: If convergence is not achieved, update 
the historical prior Cfc(^) as a uniform distribution over the union of the supports of the existing 
historical prior and the calculated posterior ([6|). Equivalently, the support of the historical prior 
extends over the union of the sequence of all historical priors {(!^q{6),... Xk-ii^)} and of all 
posteriors {'nji{6), . . . ,vjk{0)}- 

As shown in Fig. [TJ5 for the BPM example, the marginal of the historical prior for a is expanded 
to ^^(0, 200), since the optimised parameter sets have reached values as high as 200. Meanwhile, 
the /3 marginal of the historical prior remains unchanged as U{0, 100) because there has been 
no expansion of the support. 

The historical prior is used to mutate the updated prior before the next iteration by constructing 
a weighted mixture of the posterior and the historical prior with weight pm, as shown in ([7|). We 
re-populate from this updated prior by simulating from the posterior with probability pm = 0.95 
and from the historical prior with probability (1 — Pm) to generate the new sample {^fc}/ and 
iterate back. 

Figure [TJil shows the sample of J points simulated from the new prior. The a-components of 
most points are between 100 and 200 and the /3-components are between 0.1 and 1.0, but there 
are a few that lie outside the support of the posterior. The process in panels B. C, D, and E of 
Fig. [T]is iterated for this new set of points. 

4. Output of the algorithm: When the convergence criteria have been met, the iteration stops at iteration 
k* and the last 0L is presented as the optimal parameter set for the model. We can also examine 



the sequence of optimised parameter distributions {tuii 

(Fig.EF). 



6) 



, rofc*(^)} obtained for all iterations 



3 Application to biological examples 



We apply our algorithm to three biological examples of interest. The first two correspond to simulated 
data from models in the literature, while in the third example we apply our algorithm to unpublished 
experimental data of the dynamical response of an inducible genetic promoter constructed for an application 
in Synthetic Biology. 



3.1 BPM model of gene-product regulation 

The Bliss-Painter-Marr (BPM) model ( Bliss et aLl . ll982l ) describes the behaviour of a gene-enzyme-product 



control unit with a negative feedback loop: 

E = m-E), (10) 

Here, R, E and P are the concentrations (in arbitrary units) of mRNA, enzyme and product, respectively. 
The degradation rate of the product has an explicit time dependence, which in this case has the form of a 
ramp saturation: 

' 5 + 0.2t < t < 50, 



^^*^ ' 15 t>50. 

The model represents a gene that codes for an enzyme which in turn catalyses a product that inhibits the 
transcription of the gene. This se lf-inhibition can l ead to oscillations, which have been shown to occur in 



the tryptophan operon in E. coli ([Bliss et al\ . Il982l ) 



We construct a data set from simulations of this model with 0real = [cK)/3] = [240,0.15] and initial 
conditions -R(O) = E{Q) = -P(O) = 0. The data set T> consists of 10 measurements of R{t) at particular 
times with added gaussian noise drawn from7\A(0, 15^) (Table[3]). The error function Ex>{0) ([4]) corresponds 
to a non-convex optimisation landscapqj: a complex rugged surface with many local minima making global 
optimisation hard (Fig. [l]A.). 

We use Algorithm [1] to estimate the 'unknown' parameter values from the 'measurements' of R, as 
illustrated in Sec. 12.31 and Fig. [TJ Feigning ignorance of the true values, we choose a uniform prior distri- 
bution with range [0, 100] for both parameters: vro(0) ^ [t^(0, 100), C^(0, 100)]. The rest of the paramters 
are set to: J = 500, B = 50, pm = 0.95 and Tol = 10~^. Note that the true value of a falls outside of 
the assumed range of our initial prior, while the range of /3 in our initial prior is two orders of magnitude 
larger than its true value. This level of uncertainty about parameter values is typical in data fitting for 
biological models. 

Figure [1] highlights a key aspect of our algorithm: the local minimisation can lead to local minima 
outside of the range of the initial prior. Furthermore, our definition of the historical prior ensures that 
successive iterations can find solutions within the largest hypercube of optimised solutions in parameter 
space. In this example, the algorithm moves away from the f/(0, 100) prior for a and finds a distribution 
around 240 (the true value) after three iterations, while in the case of /3, the distribution collapses to 
values around 0.15 after one iteration. Although the algorithm finds the minimum 6^ after 5 iterations, 
the algorithm is terminated after 9 iterations, when the posterior distributions are similar (according to 
the MW test) and the mean errors have also converged (Table [1]). The estimated parameters for this noisy 
data set are 0^, = [251.7189, 0.1530]. In fact, the error of the estimated parameter set is lower than that of 
the real parameters: Ex>{0^) = 26.65 < E-piOrcai) = 28.26, due to the noise introduced in the data. When 
a data set without noise is used, the algorithm finds the true value of the parameters to 9 significant digits 
(not shown). 

3.2 SIR epidemics model 

Susceptible-Infected-Recovere d (SIR) models are widely used in epidemiology to describe the evolution of 
an infection in a population ( Anderson and Mavl . Il992l ). In its simplest form, the SIR model has three 



^We thank Markus Owen of the University of Nottingham for suggesting this example. 
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Figure 2: (Colour online) A: Time courses of the SIR model ([TT]) . Green squares are simulated 'data' points 
(Table g]) and bold blue lines are the model fit with the best parameters a^ = 1.0726, 7* = 0.7964, S = 0.4945, and 
v^ = 0.9863 and the best fit initial conditions Sq = 19.1591, /g ~ 10.3016, and R\ = 0.3861. Red dashed lines use 
the best fit parameters and the real initial conditions. The minimum error is Ex>{6 ) = 1.7297. B: Histogram of the 
values of the 50 best parameters and initial conditions of the model obtained after convergence at six iterations. C: 
Convergence of the error of the optiiuiscd samples at every iteration relative to the final error. 



variables: the susceptible population S, the infected population / and the recovered population R: 

S = a-{-fI + d)S, 
i=(^jS -V- d)I, 
R = vl - dR. 



(11) 



The first equation describes the change in the susceptible population, growing with birth rate a and 
decreasing by the rate of infection 7IS' and the rate of death dS. The infected population grows by the 
rate of infection 7/5' and decreases by the rate of recovery vl and the rate of death dl. The recovered 
population grows by the rate of recovery vl and decreases by the death rate dR. Here we use the same 
form of the equations as iToni et al\ (J2009l ) . 

The data generated from the model (jlip (see Tabled]) was obtained directly from iToni et ali ( 20091 ). 
Hence the original parameter values were not known to us and further we assumed the initial conditions 
also to be unknown and fitted them as parameters. We used Algorithm [1] to estimate a, 7, v, and d and 
initial conditions ^o, Iq, and Rq. The prior marginal distributions for all parameters were set as ^(0, 100). 
The other parameters were set to: J = 1000, B = 50, pm = 0.95 and Tol = 10~^. The algorithm converged 
after six iterations. Figure [2K shows the prediction of the model (fTT]) with the best parameters estimated 
by our algorithm. The fit is good with little difference between the curves obtained using the real initial 
conditions and the ones estimated by our method. 

The posterior distributions after six iterations of the algorithm are shown on Fig. [2jB- The errors ob- 
tained after each local minimisation in a decreasing order on each iteration are shown on a semilogarithmic 
scale in Fig. [2lC. We can observe how the errors decrease several orders of magnitude over the first three 
iterations and converge steadily during the last three iterations until 0^ < Tol. 

3.3 An inducible genetic switch from Synthetic Biology 

The use of inducible genetic switches is widespread in synthetic biology and bioengineering as building 
blocks for more complicated gene circuit architectures. An example is shown schematically in the inset of 
Fig. [3K- This environment-responsive switch is used to control the expression of a target gene G (usually 
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Figure 3: A: (Colour online) Inset: An inducible genetic switch consisting of Pi, a negatively regulated environment- 
responsive promoter. The repressor Ri promoted by P regulates Pi. The switch is responsive to an exogenous 
inducer Ji , which binds to Ri to relieve its repression on Pi and to turn on the transcription of the downstream 
target gene, such as a gfp. The ribosome binding site (rbs) is used to tunc the translation efficiency of the down- 
stream gene. Plot: Fluorescent response of the switch with g/p-34 to different doses of IPTG (circles). Stationary 
solutions of Eq. (|12p using the parameters obtained with Algorithm [1] (solid lines). B: Time course of the fluores- 
cent response of the switch with gfp-SA to several doses of IPTG (circles) and time-dependent solutions of Eq. (fT2|) 
using the parameters obtained with Algorithm [T] (solid lines) . Similarly good fits were obtained for responses to 
h = 0.0063, 0.0016, 0.0004, and 0.0 mM (not shown). 



tagged with green fluorescent protein or gfp) through the addition of an exogenous small molecule /i (e.g., 
isopropyl thiogalactopyranoside or IPTG). The input-output behaviou r of th is system can be described by 
the following ordinary differential equation ( Alonl . 120071 : ISzallasi et all |2006| ) : 



G = aki + 



kill' 



Kf 1 + II 



n\ 



dG. 



fl2) 



Here, aki is the basal activity of the promoter Pi and dG is the linear degradation term. The second term 
is a Hill function that models the cooperative transcription activation in response to the inducer /i with 
maximum expression rate /ci, constant Ki and Hill coefficient rii. 

The lacI- Piar switch has bee n characterised experimentally in response to different doses of IPTG in 



WaneJ ()2010l ) ; IWang et al\ (J201ll ) . Equation (fT2|) can be solved explicitly and one can use nonlinear least 
squares and the analytical solution to fit data at stationarity (i.e., at long times) and estimate q, ni, Ki, 
and the rati o k^ Id. These estiin ates have been obtained assuming equilibrium [G = 0) and initial condition 
G(0) = bv lWang et IHI (l201lh (Table [2]). 

In fact, the experiments measured time series of the expression of G every 20 minutes from t = 140 to 360 min. 
for different doses of inducer h = 0.0,3.9 x 10-^,1.6 x 10-^,6.3 x 10"^2.5 x 10"^, 0.1, 0.4, 1.6,6.4, 12.8 
mM, with two different reporters {gfp-30 and gfp-Si). See Tab les [5] and [6l Instead of assuming equilibrium 
and using only the data for t > 300 min as done previously (jWang et all l201ll ) , we apply Algorithm [T] 
to all the data with the full dynamical equation (fT2|) to estimate 9 = [a,ki,ni,Ki,d]. In this case, we 
used initial priors C/(0, 1) for a and ni; and U{0, 20) for ki, Ki and d. The other parameters were set to: 
J = 1000, B = 50, pm = 0.95, and Tol = 10"^ 

Our algorithm converged after five iterations to the parameter values in Table [2l The parameter 
estimates provide good fits to both the time courses (Fig. [3lB) and to the dose response data (Fig. [3K). 
The values of K\ and nj obtained here are similar those obtained in IWangI ( 2010l ) by using only stationary 
data. This is reassuring since these parameters are related to the dose threshold to half maximal response 
and to the steepness of the sigmoidal response, both static properties. On the other hand, the values of a 
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Wang f2010) 


Algol 


ithm [2 


Parameter 


gfp-^^ 


5/P-34 


5/P-30 


gfp-M 


a^ 


0.0012 ±0.027 


1.4720 X 10-y 


0.0043 


0.0024 


k\ 


N/A 


N/A 


76.1354 


63.6650 


n\ 


1.3700 ±0.270 


1.3690 ±0.021 


1.4832 


1.3879 


Kl 


0.2280 ±0.039 


0.2590 ±0.021 


0.2467 


0.2641 


s 


N/A 


N/A 


0.0069 


0.0052 


k\/S 


9456 ± 487 


7648 ± 152 


10983.34 


12163.04 



Table 2: Parameter values obtained from gfp-30 and gfp-SA data. In IWana (2010|), only the steady state solution 
was used. Hence only the ratio of ki and d can be estimated. 



and the ratio ki/d differ to some extent due to the (imperfect) assumption in I Wang) (120101 ) that steady state 
had been reached at t = 300 min. As Fig. [3jB shows, G is not at steady state then. Hence the parameter 
values obtained with our method should give a more faithful representation of the true dynamical response 
of the switch. 



4 Discussion 



In this work, we have presented an optimisation algorithm that brings together ingredients from Evolu- 
tionary Algorithms, local optimisation and Sequential Monte Carlo. The method is particularly useful 
for determining parameters of ordinary differential equation models from data. Our approach can also be 
used in other contexts where an optimisation problem has to be solved on complex landscapes, or when 
the objective function cannot be written explicitly. The algorithm proceeds by generating a population 
of solutions through Monte Carlo sampling from a prior distribution and refining those solutions through 
a combination of local optimisation and culling. A new prior is then created as a mixture of a historical 
prior (which records the broadest possible range of solutions found) and the distribution of the optimised 
population. This iterative process combines a strong concentration of the Monte Carlo sampling through 
local optimisation with the possibility that solutions can be found outside of the initial prior. 

We have illustrated the application of the algorithm to ODE models of biological interest and have 
found it to perform efficiently. The algorithm also works well when applied to larger problems with tens of 
parameters in a signal transduction model (paper in preparation). The efficiency of the algorithm hinges 
on selecting appropriate running parameters. For instance, the number of samples from the prior J should 
be large enough to allow for significant sampling of the parameter space while small enough to limit the 
computational cost. We have found that simulating J = 350 — 500 points in models of up to 10 parameters 
and keeping the best 15% of the local minima leads to termination within fewer than 20 iterations. In 
our implementation, the Nelder-Mead minimisation is capped at 300 evaluations. These guidelines would 
result in 150,000 evaluations of the objective function per iteration. Therefore our method can become 
computationally costly if the objective function is expensive to evaluate, e.g. in stiff models that are difficult 
to solve numerically. In essence, our algorithm proposes a trade-off: fewer but more costly iterations. It is 
important to remark that, as with any other optimisation heuristic for non convex problems, there are no 
strict guarantees of convergence to the global minimum. Therefore, it is always advisable to run the method 
with different starting points and different settings to check for consistency of the solutions obtained. 

The generation of iterative samples of the parameters dr aws inspiration from Monte Carlo methods 
( Sisson et all 120071 : iToni and Stumpi l2009l : iToni et all l2009l ) but without pursuing the strict guarantees 
that the nested structure of the distributions in ABC-SMC provides. Our evolutionary approach adopts 
a highly focused Monte Carlo sampling driven by a sharp local search with culling. Hence our iterative 
procedure generates samples that only reflect properties of the set of local minima (up to numerical cutoffs) 
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without any focus on the global convergence of the distributions. As noted by iToni et al\ ((20091), the 
distributions of the parameters (both their sequence and the final distributions) give information about the 
sensitivity of the parameters: parameters with narrow support will be more sensitive than those with wider 
support. Future developments of the method will focus on establishing a suitable theoretical framework 
that facilitates its use in model selection. Other work will consider the possibility of incorporating a 
stochastic ranking strategy in the selection of solutions, similar to that present in the SRES algorithm 



stocnastic rauKing strategy m tne selection oi solutions, simiiar to tnat present m tne bnHjb aigoritnm 
( Runarsson and Yad . I2OO0I ). in order to solve more general optimisation problems with complex feasible 



regions. 
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A BPM model data 



t 


R 








20 


43.5373 


40 


13.3667 


60 


140.8903 


80 


29.2816 


100 


108.1722 


120 


19.0093 


140 


75.0065 


160 


14.4018 


180 


50.4473 


200 


217.1082 



Table 3: BPM data. 



Tabic [3] shows data obtained from a simulation of the BPM model from equations (6) using parameters a ~ 240 
and (3 ~ 0.15, initial conditions i?(0) = 0, £'(0) = 0, -P(O) = 0, and adding random noise sampled from a A^(0, 15^) 
distribution. Only the data for variable R was obtained. 



B SIR model data 

Table |4] shows data for the SIR model generated from equations (8) using initial conditions 5'(0 ) = 20, 7(0 ) = 10 , 
and -R(O) = with added random noise sampled from a N{0, 0.2^) distribution as appears in Ref. iToni et all (|2009l ). 



C Genetic switch data 

Tab les [5] and [6] show 1 
and I Wang et aU (|201l[ ) 



Tab les [5] and [6] show the fluorescent response of IPTG-induced genetic switches described in Ref. IWangl ( 2010l ) 
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t 


s 


I 


R 


0.6 


0.12 


13.17 


9.42 


1.0 


0.12 


7.17 


11.19 


2.0 


0.10 


2.36 


10.04 


3.0 


0.38 


0.92 


6.87 


4.0 


1.00 


0.62 


4.45 


5.0 


1.20 


0.17 


3.01 


6.0 


1.46 


0.28 


1.76 


7.0 


1.38 


0.10 


1.29 


8.0 


1.57 


0.03 


0.82 


9.0 


1.46 


0.29 


0.52 


10.0 


1.25 


0.10 


0.23 


11.0 


1.56 


0.22 


0.20 



Table 4: SIR data. 
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t 


OmM 


0.0004mM 


0.0016mM 


0.0063mM 


0.025niM 


O.lmM 


0.4mM 


1.6mM 


6.4mM 


12.8mM 



































140 


88.6 


177.8 


174.4 


197.8 


210.4 


1043.6 


3945.8 


5971 


6643.8 


6521.8 


160 


120.2 


156.4 


160.6 


165.6 


209.8 


1300.8 


4695.2 


6768.4 


7361.8 


7513.8 


180 


66.6 


96.4 


94.6 


126.4 


171.6 


1438.4 


5238.8 


7465.2 


7801 


8002.4 


200 


42.8 


72.2 


76.2 


88 


134.2 


1578 


5658 


7914 


8458 


8542.8 


220 


37 


64.8 


61.2 


55 


135.8 


1667 


5799.6 


8380.2 


8976 


8914.8 


240 


39.6 


56.6 


60.4 


65.8 


142.8 


1758.6 


6108.6 


8601.4 


9172.6 


8957 


260 


36.2 


47.6 


62 


69.8 


143.6 


1859.8 


6104 


9041.8 


9528.6 


9252.8 


280 


50.8 


55.6 


58.2 


74.2 


170.6 


1968.2 


6554.4 


9071.6 


9449 


9018.4 


300 


39.6 


51 


40.8 


60.2 


197.8 


2143.4 


6452.2 


8396.2 


9269.2 


9261.2 


320 


50.4 


62.8 


65.6 


82 


273.6 


2317.8 


6880.8 


8941.2 


9887.6 


9982.8 


340 


53.8 


71.4 


71 


88.6 


296 


2512.8 


7052.2 


8972.8 


9694.6 


10108 


360 


45.6 


66 


61.6 


69.2 


340.8 


2639.2 


7047.8 


9103.6 


9911 


10018.4 


t 


OmM 


0.0004niM 


0.0016mM 


0.0063mM 


0.025niM 


O.lmM 


0.4mM 


1.6mM 


6.4mM 


12.8mM 



































140 


215 


163.4 


124.8 


134 


119 


230.4 


721.2 


1001.8 


1095.8 


701 


160 


141.6 


116.6 


95.4 


86 


40 


320.6 


937 


1112.2 


1054 


903.2 


180 


131.6 


112.2 


117.6 


84 


81 


252.2 


825.2 


727.4 


1026.8 


679.2 


200 


69.8 


42.4 


37.8 


39 


44.2 


225.2 


688.4 


829.8 


761.6 


584.6 


220 


55 


58.4 


59 


60.6 


50.4 


169.2 


645.8 


713.6 


739.6 


454 


240 


38.8 


48 


30.8 


43.4 


42.2 


148.8 


366 


418.6 


453.8 


668.2 


260 


42.2 


44 


48.6 


41 


53.8 


152.8 


496.4 


638.4 


547.8 


626.2 


280 


55.2 


54.4 


51.8 


53.6 


76 


257.2 


498.2 


722.2 


889.8 


606.2 


300 


50.4 


57.4 


62 


67.8 


95 


339.8 


447.4 


835.6 


693.2 


602.6 


320 


52.6 


69.6 


78.4 


81.2 


146.8 


385.8 


540.4 


776.4 


1084.2 


580 


340 


57 


60.6 


73.8 


65.6 


144.6 


401.2 


466.4 


396.6 


560.4 


702 


360 


61.6 


73.2 


77.2 


68.6 


151 


400 


374.8 


251 


742 


436.2 



Table 5: gfp30 fluorescence measurements (top) and standard deviations (bottom). 
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149.1 


199.7 
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2682.7 
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160 


96 
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121.6 
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180 


64.3 


178.7 
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32.2 
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43.2 
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13.4 
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55.5 


13 


16.4 


107.1 


1535.8 


4680.4 


6609.6 


7247.2 


7290.1 
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33.2 


46.1 


21.7 


22.1 


129.7 


1675.5 


4958.5 


6949.3 


7620.3 


7631.3 


320 


29.5 


39 


8.7 


22.5 


154 


1824.5 


5122.3 


7053.4 


7642.7 


7645.1 


340 


31.2 


43.2 


19.1 


27.1 


172.2 


1836.2 


5282.7 


7156.9 
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40 


10.9 
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1979.3 


5456.4 
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7899.1 
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140 


89.4 


85.8 


209.2 


120.8 


77.8 


175.8 


383.6 


295.4 


332.6 


382 


160 


59.4 


23 


166.4 


111.6 


40.6 


188.8 


572.2 


391.6 


430.6 


326.2 


180 


31.6 


38.6 


135.4 


51.2 


24.8 


210.6 


597 


370.8 


467.6 


363.8 


200 


45.2 


60.4 


83.2 


65.2 


42 


166 


573.2 


273.6 


341.6 


337 


220 


14 


27.4 


90.2 


51.2 


25 


90 


513.8 


249.6 


234 


145.2 


240 


25.2 


32.2 


53.8 


30.6 


16.2 


70 


475.2 


187.6 


464.8 


168 


260 


14.8 


17.2 


47.4 


23.8 


14.2 


68.8 


511.8 


256 


300.6 


214 


280 


20 


15.4 


46.6 


16.6 


15.8 


70.6 


395.8 


237.6 


313.6 


454.6 


300 


17.8 


17.8 


37.8 


29.8 


29.2 


178.2 


486.6 


383.8 


416.2 


377.2 


320 


21 


21.2 


43 


26.4 


46.8 


216.2 


519.6 


507.4 


674.8 


227 


340 


26 


22.2 


36.8 


25.4 


46.6 


340.8 


495.6 


655.6 


594.2 


299.4 


360 


15.2 


13 


38.4 


8.6 


50 


350.4 


604.8 


434.2 


853.8 


387.8 



Table 6: gfpSi fluorescence measurements (top) and standard deviations (bottom). 
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